[https://nvbugs/5955188][fix] Fix harmony parsers for agentic coding use cases#12045

Open
dongfengy wants to merge 4 commits into NVIDIA:main from dongfengy:user/dongfengy/aa_fix_harmony_main

Conversation


@dongfengy dongfengy commented Mar 9, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced tool call handling during streaming with improved validation and filtering logic
    • Reasoning content now preserved and propagated through streamed responses
  • Improvements

    • Improved end-of-stream processing for better consistency
    • Better parsing and validation of tool call recipients

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@dongfengy dongfengy requested a review from JunyiXu-nv March 9, 2026 17:04
@dongfengy dongfengy requested a review from a team as a code owner March 9, 2026 17:05
@dongfengy dongfengy requested a review from syuoni March 9, 2026 17:05

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

This change adds comprehensive tool call handling to the harmony adapter's token streaming logic, including detection, validation, and argument accumulation across analysis and commentary channels. It enhances harmony-to-OpenAI format conversion with reasoning_content support, improves tool call recipient parsing with JSON validation, and refines end-of-stream processing.

Changes

Cohort / File(s) Summary
Tool Call Handling & Streaming
tensorrt_llm/serve/harmony_adapter.py
Introduces tool call detection and processing during token streaming with channel state management, argument accumulation across deltas, centralized filtering logic, and support for both function-style and non-function tool invocations. Enhances harmony-to-OpenAI mapping with reasoning_content threading through streaming paths. Improves tool call recipient parsing for constrained JSON payloads with validation. Refines end-of-stream handling to flush remaining tokens and maintains consistency across response construction and delta emission.
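The argument accumulation across deltas described above can be sketched as a join-then-validate loop. `accumulate_tool_args` and its delta format are illustrative assumptions, not the adapter's actual API (the real harmony_adapter.py tracks state per tool call and channel).

```python
import json

def accumulate_tool_args(argument_deltas):
    """Join streamed argument fragments and parse once they form valid JSON.

    Illustrative only: shows the core idea of accumulating partial tool-call
    arguments across streaming deltas and validating the combined payload.
    """
    buffer = "".join(argument_deltas)
    try:
        return json.loads(buffer)  # complete, valid JSON payload
    except json.JSONDecodeError:
        return None  # still partial; keep accumulating
```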

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The PR description is a template with empty sections; no actual description, test coverage, or meaningful checklist completion provided. Resolution: Fill in the Description section explaining the issue and solution, list relevant tests in Test Coverage, and complete the PR Checklist with specific details relevant to the changes.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title is specific and directly related to the main changes in the pull request, which focus on fixing harmony parsers to handle tool calls and agentic features for coding use cases.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/harmony_adapter.py`:
- Around line 1613-1615: The final-chunk path skips emitting the initial
assistant-role SSE chunk when output.token_ids_diff is empty; update the logic
around harmony_adapter.create_openai_streaming_response so it always emits the
assistant-role/event chunk prior to sending finish/usage chunks even if
token_ids_diff is empty—i.e., factor out or call the same assistant-chunk
emission used in the non-done path before checking output.token_ids_diff,
ensuring the code paths at the blocks around output.token_ids_diff (the call to
harmony_adapter.create_openai_streaming_response and the subsequent finish/usage
sending blocks) both produce the initial assistant-role SSE event.
- Around line 1615-1620: The code calls
harmony_adapter.create_openai_streaming_response(...) and then unconditionally
appends a terminal final_response when done=True, which can duplicate a stop
chunk if create_openai_streaming_response already produced one; update the logic
around remaining_responses and final_response in the done=True branches (the
occurrences that use remaining_responses and final_response) to short-circuit:
check whether remaining_responses already contains a terminal/stop chunk (the
same marker your streaming responses use) and if so do not append another
final_response, and apply the same fix to the duplicate block later (the similar
code around lines 1632-1649) to avoid emitting duplicate terminal stop chunks.
- Around line 214-219: The current filtering in should_filter_tools + the other
site (and harmony_output_to_openai) wrongly blocks built-in tools like
"browser", "python", and "container" when external tools are present; implement
a single helper (e.g., is_tool_allowed(request_id, func_name, tool_choice,
available_tools)) and use it in both filter locations so that: if tool_choice ==
"none" it returns false for everything; if tool_choice != "none" it always
returns true for built-ins ("browser", "python", "container"); and for
non-built-in functions it checks membership against available_tools (and
respects should_filter_tools). Update calls in the places referencing
should_filter_tools / available_tools and harmony_output_to_openai to call this
helper instead.
- Around line 205-223: The current branch opens streamed tool calls for
parser.current_channel == "analysis" but the transition logic in
process_token_batch() and process_token_batch_to_messages() only deactivates
previous tool calls when prev_channel == "commentary", causing reuse/merging of
tool_id across analysis-invocations; update both transition handlers so they
retire (set active=False) prior tool calls when prev_channel is either
"commentary" or "analysis" (i.e., check prev_channel in
("commentary","analysis")) before creating/getting a new tool call via
_get_or_create_tool_call, ensuring parser.current_channel/
current_channel_state/_get_or_create_tool_call don't reuse an old tool_id from
the other channel.
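The short-circuit suggested in the second comment (avoiding a duplicate terminal stop chunk) might look like the sketch below. The `'"finish_reason"'` marker is an assumption about what the serialized streaming chunks contain, not the adapter's actual convention.

```python
def append_final_if_missing(remaining_responses, final_response):
    """Append the terminal chunk only when the stream lacks one.

    Hypothetical sketch: skip the extra final_response when
    create_openai_streaming_response already emitted a stop chunk.
    Checking for '"finish_reason"' in the chunk text is an assumed
    convention, not the real harmony_adapter.py marker.
    """
    def is_terminal(chunk: str) -> bool:
        return '"finish_reason"' in chunk

    if not any(is_terminal(r) for r in remaining_responses):
        remaining_responses.append(final_response)
    return remaining_responses
```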
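A minimal sketch of the `is_tool_allowed` helper proposed in the third comment. The built-in tool names and the tool_choice semantics follow the comment's wording; the signature (with `request_id` omitted for brevity) and types are assumptions, not the adapter's real API.

```python
# Built-ins named in the review comment; the real set may differ.
BUILTIN_TOOLS = {"browser", "python", "container"}

def is_tool_allowed(func_name, tool_choice, available_tools):
    """Return True if a tool call with this recipient should be emitted."""
    if tool_choice == "none":
        return False                         # caller disabled all tool calls
    if func_name in BUILTIN_TOOLS:
        return True                          # built-ins bypass the external list
    return func_name in (available_tools or ())  # externals need membership
```

Used at both filter sites, this keeps built-in invocations flowing even when external tools are declared, which is the bug the comment describes.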
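The channel-transition fix in the last comment can be sketched as follows; the state dicts and function name are illustrative, not the adapter's real structures.

```python
# Both channels can carry tool calls, per the review comment.
TOOL_CALL_CHANNELS = ("commentary", "analysis")

def retire_stale_tool_calls(prev_channel, tool_call_states):
    """Deactivate tool calls opened on any previous tool-call channel.

    Hypothetical sketch: retiring prior calls when prev_channel is either
    "commentary" or "analysis" ensures a new invocation never reuses a
    tool_id from the other channel.
    """
    if prev_channel in TOOL_CALL_CHANNELS:
        for state in tool_call_states:
            state["active"] = False
```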


📥 Commits

Reviewing files that changed from the base of the PR and between 7a68c42 and efffc01.

📒 Files selected for processing (1)
  • tensorrt_llm/serve/harmony_adapter.py

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
@dongfengy dongfengy force-pushed the user/dongfengy/aa_fix_harmony_main branch from efffc01 to 81f3318 on March 9, 2026 17:22
@dongfengy

/bot run

@tensorrt-cicd

PR_Github #38304 [ run ] triggered by Bot. Commit: 81f3318 Link to invocation

@tensorrt-cicd

PR_Github #38304 [ run ] completed with state SUCCESS. Commit: 81f3318
/LLM/main/L0_MergeRequest_PR pipeline #29679 completed with status: 'ABORTED'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

@JunyiXu-nv JunyiXu-nv left a comment


LGTM

return {"should_stop": "Repeated message"}

# Check for tool calls first, regardless of channel.
# The model may emit tool calls on either "commentary" or "analysis" channel.

This behavior is kind of violating the harmony definition. https://github.com/openai/harmony/blob/main/docs/format.md#receiving-tool-calls

But if the model is behaving like this, we should WAR to unblock our work.

@JunyiXu-nv

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #38380 [ run ] triggered by Bot. Commit: 246e773 Link to invocation

@tensorrt-cicd

PR_Github #38380 [ run ] completed with state SUCCESS. Commit: 246e773
/LLM/main/L0_MergeRequest_PR pipeline #29747 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
